A reverberation-time-aware deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). There are three key steps in designing a robust system. First, to accomplish simultaneous speech dereverberation and beamforming, we propose a framework, namely DNNSpatial, that selectively concatenates log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and maps them into the expected output LPS features of anechoic reference speech using a single deep neural network (DNN). Next, the temporal auto-correlation function of the received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts are needed in feature selection during the DNNSpatial training stage in order to optimize system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before feeding the LPS features to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms both the state-of-the-art weighted prediction error (WPE) signal-processing dereverberation algorithm and conventional DNNSpatial systems that do not take reverberation time into account, even in extremely weak and severe reverberant conditions. The proposed technique generalizes well to unseen room sizes, array geometries, and loudspeaker positions, and is robust to reverberation time estimation error.
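The multi-channel feature construction in the first step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the FFT size, hop length, window, context width, and all function names are assumptions chosen for clarity. It shows LPS extraction per microphone and the concatenation of symmetric temporal context from every channel into one input vector per frame.

```python
import numpy as np

def log_power_spectrum(x, n_fft=512, hop=256):
    """Frame the signal and compute per-frame log-power spectra (LPS).

    n_fft and hop are illustrative choices, not the paper's settings.
    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    spec = np.fft.rfft(frames * np.hanning(n_fft), axis=1)
    return np.log(np.abs(spec) ** 2 + 1e-10)

def temporal_spatial_context(channel_lps, n_ctx):
    """Concatenate +/- n_ctx neighbouring LPS frames from every microphone
    channel into a single input vector per centre frame.

    In the proposed framework the context width would be chosen as a
    function of the estimated RT60; here it is a fixed parameter.
    """
    n_frames = channel_lps[0].shape[0]
    feats = []
    for t in range(n_ctx, n_frames - n_ctx):
        window = [ch[t - n_ctx : t + n_ctx + 1].ravel() for ch in channel_lps]
        feats.append(np.concatenate(window))
    return np.stack(feats)

# Two simulated microphone signals; a 5-frame context (+/- 2) per channel.
rng = np.random.default_rng(0)
mics = [rng.standard_normal(16000) for _ in range(2)]
lps = [log_power_spectrum(m) for m in mics]
X = temporal_spatial_context(lps, n_ctx=2)
print(X.shape)  # (centre frames, channels * context frames * frequency bins)
```

Each row of `X` would serve as one DNN input vector, with the corresponding anechoic-speech LPS frame as the regression target; widening `n_ctx` for larger estimated RT60s is the RT60-dependent context selection the abstract describes.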